max rank | avg. rank | sentence |
---|---|---|
161 | 107.4286 | De grotste sôortn kunn styf lange leevn. |
315 | 156.2222 | In 't stad zelve weunn d'r 2 miljoen menschn. |
353 | 92.3636 | 't Volgn nog 10 doagn toet 't ende van 't joar. |
394 | 184.1250 | Je kwam azo keunienk Willem I van Iengeland. |
416 | 172.3636 | E krygt 5 joar, moar e moet moa êen joar zittn. |
465 | 172.4545 | Jan Frans blêef in Rome vo de reste van zyn leevn. |
483 | 158.6000 | Zyn derde zeune Karel is mo geboorn achter zyn dôod. |
505 | 142.2222 | Van toun of wierd der vele gebruuk van gemakt. |
507 | 106.5455 | 't Volgn nog 6 doagn toet 't ende van 't joar. |
541 | 195.2000 | De gemêente êt e bitje mêer of 1 000 inweuners. |
542 | 109.7273 | 't Volgn nog 15 doagn toet 't ende van 't joar. |
549 | 209.1250 | De grotste soortn kunn e meter lank kommn. |
596 | 151.0000 | Mo over da werk zyn ze nog nie hêel zeker. |
632 | 177.0000 | E volgde zyn voader op ton datn dien dôod wos. |
650 | 186.0526 | De mêeste sôortn hen êen sôorte plantn woa dan ze van eetn, toune eetn ze mêestol gin aar plantn. |
663 | 194.9091 | Ze zittn an mekoar vaste en kunn oopn en toe goan. |
669 | 139.7273 | Der volgn nog 30 doagn toet 't ende van 't joar. |
669 | 154.0909 | Der volgn nog 8 doagn toet 't ende van 't joar. |
685 | 333.8571 | Doadeure kwam 't groafschap Bourgondië ounder Vrankryk. |
710 | 202.7778 | In de 12e eeuwe is ter doa vele veranderd. |
728 | 170.3077 | De mêeste minsn weetn zelfs nie dan ze 't an 't doen zyn. |
731 | 226.2000 | Ip 't groundgebied weunn der e stik of 2000 menschn. |
737 | 215.4545 | Karel was nog mo tien joar oud os zyn voader stierf. |
737 | 176.5882 | Moa tien joar loater was 't were van dadde en der wierd were e nieuwe torre gebouwd. |
764 | 129.9091 | 't Volgn nog 40 doagn toet 't ende van 't joar. |
773 | 302.3750 | Ze kwoamn geirn tegoare vo muziek te speeln. |
774 | 186.8333 | 't Es doar êen van de twêe officiële toaln van 't land. |
779 | 276.1111 | Moar uutendelik kwam zyn zeune Filips keunienk van Spanje. |
786 | 173.3158 | Sommigte leevn mêer ip 't land en kommn allêne moa noa 't woater vor under eiers of te zettn. |
789 | 301.3333 | Ze leevn in groepn en loopn assan achter mekoar. |
The maximum word rank of a sentence is, by definition, the rank of its rarest word. If this value is low, every word in the sentence is a high-frequency word. The table therefore lists the sentences with the lowest maximum word rank, restricted to sentences longer than 40 characters.
The overall distribution of the maximum rank across all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table are of interest because they are usually easy to understand. The distribution may give insight into the corpus and may provide parameters for comparing languages.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a large corpus yields more impressive results.
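The definition of the maximum and average word rank can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and a rank table built from the toy corpus itself; the function names `rank_table` and `sentence_ranks` are hypothetical and not part of the corpus tooling.

```python
from collections import Counter


def rank_table(sentences):
    """Map each word to its frequency rank (1 = most frequent word)."""
    counts = Counter(word for s in sentences for word in s.split())
    # Sort by descending frequency; break ties alphabetically so that
    # the ranking is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def sentence_ranks(sentence, ranks):
    """Return (maximum rank, average rank) of the words in a sentence."""
    word_ranks = [ranks[word] for word in sentence.split()]
    return max(word_ranks), sum(word_ranks) / len(word_ranks)


# Toy corpus: "a" occurs 3 times, "b" twice, "c" and "d" once each.
corpus = ["a b c", "a b", "a d"]
ranks = rank_table(corpus)

# A low maximum rank means the sentence contains only frequent words.
by_max_rank = sorted(corpus, key=lambda s: sentence_ranks(s, ranks)[0])
```

Sorting by the first component of `sentence_ranks` puts the sentence made up of the most frequent words first, which is the ordering used for the table.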
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m limit 30;
Distribution data:
select m, count(*) from (select 100* round((max(w_id)-100)/100) as m from sentences s, inv_w i where s.s_id=i.s_id and i.w_id>100 group by s.s_id) aa group by m;
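Both queries can be tried out on a toy database. The following sketch uses Python's built-in `sqlite3` module. The table layouts, sample sentences, and word IDs are invented for illustration, and the reading of the queries (word IDs up to 100 are skipped, so rank = `w_id` - 100) is an assumption inferred from the SQL itself. Two small deviations from the original statements are marked in the comments: division by `100.0`, because SQLite would otherwise perform integer division, and an added `ORDER BY` for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; the real tables may have more columns.
    CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
    CREATE TABLE inv_w (w_id INTEGER, s_id INTEGER);
""")
conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
    (1, "This sentence is long enough to pass the length filter."),
    (2, "Another sentence that is also longer than forty characters."),
    (3, "Too short."),
    (4, "Short one."),
])
conn.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (50, 1), (110, 1), (160, 1), (240, 1),   # w_id 50 is ignored (<= 100)
    (120, 2), (170, 2), (460, 2),
    (130, 3), (1349, 3),
    (105, 4), (235, 4),
])

# Table data: sentences longer than 40 characters, lowest maximum rank first.
table_rows = conn.execute("""
    SELECT max(w_id) - 100 AS m, avg(w_id) - 100 AS a, s.sentence
    FROM sentences s, inv_w i
    WHERE s.s_id = i.s_id AND length(sentence) > 40 AND i.w_id > 100
    GROUP BY s.s_id ORDER BY m LIMIT 30
""").fetchall()

# Distribution data: maximum rank per sentence, rounded to the nearest 100.
# Deviations from the original: "/ 100.0" and "ORDER BY m".
dist_rows = conn.execute("""
    SELECT m, count(*) FROM (
        SELECT 100 * round((max(w_id) - 100) / 100.0) AS m
        FROM sentences s, inv_w i
        WHERE s.s_id = i.s_id AND i.w_id > 100
        GROUP BY s.s_id
    ) aa GROUP BY m ORDER BY m
""").fetchall()

for row in table_rows:
    print(row)
for row in dist_rows:
    print(row)
```

The two short sentences are excluded from the table by the length filter but still contribute to the distribution, which counts every sentence in the corpus.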
Explain the distribution, especially the increase at its right-hand end.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II